This report details an exploration of the modeltime package for time series forecasting in R.

A Forecasting Workflow

The process of producing forecasts for time series data can generally be broken down into a few steps: Visualize, Specify, Estimate and Evaluate.

A Visual Analytics Application

  • The Visualize, Specify, Estimate and Evaluate steps form an iterative process that requires the forecaster to perform repeated cycles of calculated trial and error to achieve a good result. The Shiny Visual Analytics Application (VAA) will use graphs to explore the data, assess the validity of the fitted models and present the forecasting results. By providing an interface to tune and visualize models, the application will let forecasters easily experiment with different algorithms without having to write code or scripts.

Benefits of modeltime

  • Integrates closely with the tidyverse collection of packages; in particular, modeltime lets users tap into the machine learning ecosystem of tidymodels, notably the parsnip models.
  • Makes it easy to evaluate and combine multiple models in a single Model Table.
  • Allows a plethora of forecasting models to be created easily by combining modeltime and parsnip.
  • Supports hybrid models, such as boosted ARIMA and boosted Prophet.

Time series forecasting with Modeltime

Loading the packages

  • tidyverse
  • lubridate
  • timetk
  • modeltime
  • tidymodels
  • tidyquant
  • glmnet
  • randomForest
  • earth
packages <- c('tidyverse', 'lubridate', 'timetk', 'modeltime', 'tidymodels', 'tidyquant', 'glmnet', 'randomForest', 'earth')

for (p in packages){
  # Install any missing package before loading it
  if (!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Loading the data

The objective is to make the application stock agnostic, meaning it can be used with any stock rather than one specific ticker. For the purpose of this report, we will be using Apple’s stock (AAPL).

stock <- tq_get("AAPL", get = "stock.prices", from = "2020-01-01")

We’re only interested in the closing price from the start of this year, 2021.

The data is then preprocessed: the closing prices are differenced to remove the trend and make the series stationary.

stock_tbl <- stock %>%
  select(date, close) %>%
  filter(date >= "2021/01/01") %>%
  set_names(c("date", "value")) %>%
  mutate(value = diff_vec(value)) %>%
  mutate(value = replace_na(value, 0))
## diff_vec(): Initial values: 129.410004
stock_tbl
## # A tibble: 67 x 2
##    date        value
##    <date>      <dbl>
##  1 2021-01-04  0    
##  2 2021-01-05  1.60 
##  3 2021-01-06 -4.41 
##  4 2021-01-07  4.32 
##  5 2021-01-08  1.13 
##  6 2021-01-11 -3.07 
##  7 2021-01-12 -0.180
##  8 2021-01-13  2.09 
##  9 2021-01-14 -1.98 
## 10 2021-01-15 -1.77 
## # ... with 57 more rows
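Because the models are trained on differenced values, forecasts can be mapped back to the original price scale with timetk’s diff_inv_vec(), seeded with the initial value reported by diff_vec() above. A minimal sketch; the variable name prices is illustrative:

```r
# Sketch: invert the differencing to recover the price series.
# diff_vec() reported the initial value 129.410004 above; drop the
# leading 0 we substituted for the NA before inverting.
library(timetk)

prices <- diff_inv_vec(stock_tbl$value[-1], initial_values = 129.410004)
head(prices)
```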

Plotting the stock’s historical data

stock_tbl %>%
  plot_time_series(date, value, .interactive = TRUE)

Split the data, holding out the last 3 weeks as validation data.

splits <- stock_tbl %>%
  time_series_split(assess = "3 weeks", cumulative = TRUE)
## Using date_var: date
splits %>%
  tk_time_series_cv_plan() %>%
  plot_time_series_cv_plan(date, value, .interactive = TRUE)

Since the raw date column is meaningless to the machine learning models, some feature engineering is required to convert it into usable features. This can be done with step_timeseries_signature() from timetk, which adds a time series signature step to a recipe object. With the recipe, we can perform additional preprocessing steps such as removing unused columns and creating dummy variables for categorical features.

For the parsnip models, the date column is given an ID role so that it is excluded from the predictors.

recipe_spec <- recipe(value ~ date, training(splits)) %>%
  step_timeseries_signature(date) %>%
  step_rm(contains("am.pm"), contains("hour"), contains("minute"),
          contains("second"), contains("xts"), contains("half"),
          contains(".iso")) %>%
  #step_mutate(value = replace_na(value, 0)) %>%
  step_normalize(date_index.num) %>%
  #step_fourier(date, period = 365, K = 5) %>%
  step_dummy(all_nominal())
  
recipe_spec_parsnip <- recipe_spec %>%
  update_role(date, new_role = "ID")

bake(recipe_spec %>% prep(), new_data = NULL)
## # A tibble: 53 x 34
##    date        value date_index.num date_year date_quarter date_month date_day
##    <date>      <dbl>          <dbl>     <int>        <int>      <int>    <int>
##  1 2021-01-04  0              -1.65      2021            1          1        4
##  2 2021-01-05  1.60           -1.61      2021            1          1        5
##  3 2021-01-06 -4.41           -1.57      2021            1          1        6
##  4 2021-01-07  4.32           -1.52      2021            1          1        7
##  5 2021-01-08  1.13           -1.48      2021            1          1        8
##  6 2021-01-11 -3.07           -1.34      2021            1          1       11
##  7 2021-01-12 -0.180          -1.30      2021            1          1       12
##  8 2021-01-13  2.09           -1.26      2021            1          1       13
##  9 2021-01-14 -1.98           -1.21      2021            1          1       14
## 10 2021-01-15 -1.77           -1.17      2021            1          1       15
## # ... with 43 more rows, and 27 more variables: date_wday <int>,
## #   date_mday <int>, date_qday <int>, date_yday <int>, date_mweek <int>,
## #   date_week <int>, date_week2 <int>, date_week3 <int>, date_week4 <int>,
## #   date_mday7 <int>, date_month.lbl_01 <dbl>, date_month.lbl_02 <dbl>,
## #   date_month.lbl_03 <dbl>, date_month.lbl_04 <dbl>, date_month.lbl_05 <dbl>,
## #   date_month.lbl_06 <dbl>, date_month.lbl_07 <dbl>, date_month.lbl_08 <dbl>,
## #   date_month.lbl_09 <dbl>, date_month.lbl_10 <dbl>, date_month.lbl_11 <dbl>,
## #   date_wday.lbl_1 <dbl>, date_wday.lbl_2 <dbl>, date_wday.lbl_3 <dbl>,
## #   date_wday.lbl_4 <dbl>, date_wday.lbl_5 <dbl>, date_wday.lbl_6 <dbl>

Creating and fitting an ARIMA model using modeltime

model_fit_arima <- arima_reg() %>%
  set_engine("auto_arima") %>%
  fit(value ~ date, training(splits))

model_fit_arima
## parsnip model object
## 
## Fit time:  271ms 
## Series: outcome 
## ARIMA(0,0,0)(1,0,0)[5] with zero mean 
## 
## Coefficients:
##          sar1
##       -0.3853
## s.e.   0.1307
## 
## sigma^2 estimated as 6.399:  log likelihood=-124.29
## AIC=252.58   AICc=252.82   BIC=256.52

Similarly, we use modeltime to create and train a Prophet model

workflow_fit_prophet <- workflow() %>%
  add_model(
    prophet_reg() %>% set_engine("prophet")
  ) %>%
  add_recipe(recipe_spec) %>%
  fit(training(splits))
## Disabling yearly seasonality. Run prophet with yearly.seasonality=TRUE to override this.
## Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
workflow_fit_prophet
## == Workflow [trained] ==========================================================
## Preprocessor: Recipe
## Model: prophet_reg()
## 
## -- Preprocessor ----------------------------------------------------------------
## 4 Recipe Steps
## 
## * step_timeseries_signature()
## * step_rm()
## * step_normalize()
## * step_dummy()
## 
## -- Model -----------------------------------------------------------------------
## PROPHET w/ Regressors Model
## - growth: 'linear'
## - n.changepoints: 25
## - changepoint.range: 0.8
## - yearly.seasonality: 'auto'
## - weekly.seasonality: 'auto'
## - daily.seasonality: 'auto'
## - seasonality.mode: 'additive'
## - changepoint.prior.scale: 0.05
## - seasonality.prior.scale: 10
## - holidays.prior.scale: 10
## - logistic_cap: NULL
## - logistic_floor: NULL
## - extra_regressors: 30

Using the new features, we can start creating and training machine learning models, starting with a linear regression model. Setting the mixture argument allows us to configure the model as Ridge, Lasso or ElasticNet. Here, we’ll use ElasticNet for our experiment.

model_spec_glmnet <- linear_reg(penalty = 0.01, mixture = 0.5) %>%
  set_engine("glmnet")

workflow_fit_glmnet <- workflow() %>%
  add_model(model_spec_glmnet) %>%
  add_recipe(recipe_spec_parsnip) %>%
  fit(training(splits))
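For reference, the same specification covers the other regularization variants purely through mixture; the penalty value below is illustrative:

```r
# mixture = 0 applies only the L2 penalty (Ridge); mixture = 1 applies
# only the L1 penalty (Lasso); values in between blend the two (ElasticNet).
model_spec_ridge <- linear_reg(penalty = 0.01, mixture = 0) %>%
  set_engine("glmnet")

model_spec_lasso <- linear_reg(penalty = 0.01, mixture = 1) %>%
  set_engine("glmnet")
```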

A random forest model can be set up similarly to the linear regression model.

model_spec_rf <- rand_forest(trees = 500, min_n = 50) %>%
  set_engine("randomForest")

workflow_fit_rf <- workflow() %>%
  add_model(model_spec_rf) %>%
  add_recipe(recipe_spec_parsnip) %>%
  fit(training(splits))

A boosted tree model using the XGBoost engine follows the same pattern.

workflow_fit_xgboost <- workflow() %>%
  add_model(
    boost_tree() %>% set_engine("xgboost")
  ) %>%
  add_recipe(recipe_spec_parsnip) %>%
  fit(training(splits))

workflow_fit_xgboost
## == Workflow [trained] ==========================================================
## Preprocessor: Recipe
## Model: boost_tree()
## 
## -- Preprocessor ----------------------------------------------------------------
## 4 Recipe Steps
## 
## * step_timeseries_signature()
## * step_rm()
## * step_normalize()
## * step_dummy()
## 
## -- Model -----------------------------------------------------------------------
## ##### xgb.Booster
## raw: 35.9 Kb 
## call:
##   xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0, 
##     colsample_bytree = 1, min_child_weight = 1, subsample = 1, 
##     objective = "reg:squarederror"), data = x$data, nrounds = 15, 
##     watchlist = x$watchlist, verbose = 0, nthread = 1)
## params (as set within xgb.train):
##   eta = "0.3", max_depth = "6", gamma = "0", colsample_bytree = "1", min_child_weight = "1", subsample = "1", objective = "reg:squarederror", nthread = "1", validate_parameters = "TRUE"
## xgb.attributes:
##   niter
## callbacks:
##   cb.evaluation.log()
## # of features: 32 
## niter: 15
## nfeatures : 32 
## evaluation_log:
##     iter training_rmse
##        1      2.260684
##        2      1.881258
## ---                   
##       14      0.448955
##       15      0.409669

The last machine learning model to be tested is an SVM with a radial basis function kernel.

model_spec_svm <- svm_rbf() %>% 
  set_engine("kernlab") %>% 
  set_mode("regression")
  
workflow_fit_svm <- workflow() %>%
  add_model(model_spec_svm) %>%
  add_recipe(recipe_spec_parsnip) %>%
  fit(training(splits))
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.

modeltime also provides a boosted ARIMA model, which fits an ARIMA model and then models its residual errors with XGBoost.

workflow_fit_arima_boosted <- workflow() %>%
  add_model(
    arima_boost(min_n = 2, learn_rate = 0.015) %>%
    set_engine(engine = "auto_arima_xgboost")
  ) %>%
  add_recipe(recipe_spec) %>%
  fit(training(splits))

A hybrid model of Prophet and XGBoost is also supported by modeltime

model_spec_prophet_boost <- prophet_boost(seasonality_weekly = FALSE,
                                          seasonality_daily =  FALSE,
                                          seasonality_yearly = FALSE) %>%
  set_engine("prophet_xgboost") 

workflow_fit_prophet_boost <- workflow() %>%
  add_model(model_spec_prophet_boost) %>%
  add_recipe(recipe_spec) %>%
  fit(training(splits))

workflow_fit_prophet_boost
## == Workflow [trained] ==========================================================
## Preprocessor: Recipe
## Model: prophet_boost()
## 
## -- Preprocessor ----------------------------------------------------------------
## 4 Recipe Steps
## 
## * step_timeseries_signature()
## * step_rm()
## * step_normalize()
## * step_dummy()
## 
## -- Model -----------------------------------------------------------------------
## PROPHET w/ XGBoost Errors
## ---
## Model 1: PROPHET
##  - growth: 'linear'
##  - n.changepoints: 25
##  - changepoint.range: 0.8
##  - yearly.seasonality: 'FALSE'
##  - weekly.seasonality: 'FALSE'
##  - daily.seasonality: 'FALSE'
##  - seasonality.mode: 'additive'
##  - changepoint.prior.scale: 0.05
##  - seasonality.prior.scale: 10
##  - holidays.prior.scale: 10
##  - logistic_cap: NULL
##  - logistic_floor: NULL
## 
## ---
## Model 2: XGBoost Errors
## 
## xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0, 
##     colsample_bytree = 1, min_child_weight = 1, subsample = 1, 
##     objective = "reg:squarederror"), data = x$data, nrounds = 15, 
##     watchlist = x$watchlist, verbose = 0, nthread = 1)

Once all the models have been fitted, they are added to a Model Table using modeltime_table(), which provides an easy way to visualize and evaluate the models’ performance, as well as to forecast with all of them at once. Calibrating the table against the hold-out test set computes the out-of-sample residuals used for the accuracy metrics and confidence intervals.

model_table <- modeltime_table(
  model_fit_arima,
  workflow_fit_arima_boosted,
  workflow_fit_prophet,
  workflow_fit_prophet_boost,
  #model_fit_ets,
  workflow_fit_glmnet,
  workflow_fit_rf,
  workflow_fit_svm,
  workflow_fit_xgboost
) 

model_table
## # Modeltime Table
## # A tibble: 8 x 3
##   .model_id .model     .model_desc                                            
##       <int> <list>     <chr>                                                  
## 1         1 <fit[+]>   ARIMA(0,0,0)(1,0,0)[5] WITH ZERO MEAN                  
## 2         2 <workflow> ARIMA(0,0,0)(1,0,0)[5] WITH ZERO MEAN W/ XGBOOST ERRORS
## 3         3 <workflow> PROPHET W/ REGRESSORS                                  
## 4         4 <workflow> PROPHET W/ XGBOOST ERRORS                              
## 5         5 <workflow> GLMNET                                                 
## 6         6 <workflow> RANDOMFOREST                                           
## 7         7 <workflow> KERNLAB                                                
## 8         8 <workflow> XGBOOST
calibration_table <- model_table %>%
  modeltime_calibrate(testing(splits))

calibration_table
## # Modeltime Table
## # A tibble: 8 x 5
##   .model_id .model    .model_desc                         .type .calibration_da~
##       <int> <list>    <chr>                               <chr> <list>          
## 1         1 <fit[+]>  ARIMA(0,0,0)(1,0,0)[5] WITH ZERO M~ Test  <tibble [14 x 4~
## 2         2 <workflo~ ARIMA(0,0,0)(1,0,0)[5] WITH ZERO M~ Test  <tibble [14 x 4~
## 3         3 <workflo~ PROPHET W/ REGRESSORS               Test  <tibble [14 x 4~
## 4         4 <workflo~ PROPHET W/ XGBOOST ERRORS           Test  <tibble [14 x 4~
## 5         5 <workflo~ GLMNET                              Test  <tibble [14 x 4~
## 6         6 <workflo~ RANDOMFOREST                        Test  <tibble [14 x 4~
## 7         7 <workflo~ KERNLAB                             Test  <tibble [14 x 4~
## 8         8 <workflo~ XGBOOST                             Test  <tibble [14 x 4~
calibration_table %>%
  modeltime_forecast(actual_data = stock_tbl) %>%
  plot_modeltime_forecast(.interactive = TRUE)
calibration_table %>%
  modeltime_accuracy() %>%
  table_modeltime_accuracy(.interactive = FALSE)
Accuracy Table

.model_id  .model_desc                                              .type    mae    mape  mase   smape  rmse   rsq
        1  ARIMA(0,0,0)(1,0,0)[5] WITH ZERO MEAN                    Test    1.78  118.43  1.00  162.96  2.15  0.09
        2  ARIMA(0,0,0)(1,0,0)[5] WITH ZERO MEAN W/ XGBOOST ERRORS  Test    1.63  107.67  0.91  137.55  1.98  0.04
        3  PROPHET W/ REGRESSORS                                    Test    2.03  171.10  1.14  167.00  2.42  0.06
        4  PROPHET W/ XGBOOST ERRORS                                Test    3.26  412.22  1.83  179.65  3.65  0.00
        5  GLMNET                                                   Test    1.71  147.62  0.96  134.49  2.11  0.15
        6  RANDOMFOREST                                             Test    1.87  182.14  1.05  159.34  2.26  0.05
        7  KERNLAB                                                  Test    1.82  142.74  1.02  175.55  2.12  0.22
        8  XGBOOST                                                  Test    3.22  416.91  1.81  171.43  3.59  0.00
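The accuracy table can also be used programmatically, for example to keep only the model with the lowest test RMSE before refitting. A sketch; best_id and best_model are illustrative names:

```r
# Sketch: select the best-performing model (lowest RMSE on the test set)
best_id <- calibration_table %>%
  modeltime_accuracy() %>%
  slice_min(rmse, n = 1) %>%
  pull(.model_id)

best_model <- calibration_table %>%
  filter(.model_id == best_id)
```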

Refit and Forecast Forward

calibration_table %>%
  modeltime_refit(stock_tbl) %>%
  modeltime_forecast(h = "2 weeks", actual_data = stock_tbl) %>%
  plot_modeltime_forecast(.interactive = TRUE)

Dashboard design
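As a starting point for the dashboard, the interactive forecast plot can be embedded in Shiny, since plot_modeltime_forecast() returns a plotly object when .interactive = TRUE. A minimal sketch, assuming the calibration_table and stock_tbl objects from this report are available; all UI element names are illustrative:

```r
library(shiny)
library(plotly)
library(modeltime)

ui <- fluidPage(
  titlePanel("Stock Forecast"),
  plotlyOutput("forecast_plot")
)

server <- function(input, output) {
  # Render the multi-model forecast produced earlier in this report
  output$forecast_plot <- renderPlotly({
    calibration_table %>%
      modeltime_forecast(actual_data = stock_tbl) %>%
      plot_modeltime_forecast(.interactive = TRUE)
  })
}

shinyApp(ui, server)
```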

Reference